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Abstract. - An exact solution to a family of parity check error-correcting codes is provided by 
mapping the problem onto a Husimi cactus. The solution obtained in the thermodynamic limit 
recovers the replica symmetric theory results and provides a very good approximation to finite 
systems of moderate size. The probability propagation decoding algorithm emerges naturally 
from the analysis. A phase transition between decoding success and failure phases is found 
to coincide with an information-theoretic upper bound. The method is employed to compare 
Gallager and MN codes. 



The theory of error-correcting codes concentrates on the efficient introduction of redundancy 
to given messages for protecting the information content against corruption. The theoretical 
foundations of this area were laid by Shannon's seminal work M and have been developing 
ever since (see and references therein). One of the main results obtained in this field is 
the celebrated channel coding theorem stating that there exists a code such that the average 
message error probability Pg, when maximum likelihood decoding is used, is upper bounded 
by Pe < e~ M Et - R \ where M is the length of the encoded transmission and R — ( message 
information content )/M is the code rate. The exponent E(R) is positive for code rates below 
the channel capacity, corresponding to the maximal mutual information between the received 
and the transmitted signals, and vanishes above it. For rates R below the channel capacity, 
commonly termed Shannon's bound, the error probability can be made arbitrarily small. 

The channel coding theorem is based on unstructured random codes and impractical de- 
coders as maximum likelihood || or typical sets [Bj. In the last fifty years several practical 
methods have been proposed and implemented, but none has been able to saturate Shan- 
non's bound. In 1963 Gallager Q proposed a coding scheme which involves sparse linear 
transformations of binary messages that was forgotten soon after in part due to the success of 
convolutional codes Q and the computational limitations of the time. Gallager codes have been 
recently rediscovered by MacKay and Neal (MN) that independently proposed a closely related 
code H . This almost coincided with the breakthrough discovery of the high performance turbo 
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codes || . Variations of Gallager codes have displayed performance comparable (and sometimes 
superior) to turbo codes 0, qualifying them as state-of-the-art codes. 

Statistical physics has been applied to the analysis of error-correcting codes as an alternative 
to information theory methods yielding some new interesting directions and suggesting new 
high-performance codes j7| . Sourlas was the first to relate error-correcting codes to spin glass 
models [|, showing that the Random Energy Model (REM)||, [h], [TTJ] can be thought of as 
an ideal code capable of saturating Shannon's bound at vanishing code rates. This work was 
extended recently to the case of finite code rates [12, n3| and has been further developed 
for analysing MN codes of various structures |l4[ 15, [L6|. All of the analyses mentioned 
above as well as the recent turbo codes analysis [ |17| relied on the replica approach under the 
assumption of replica symmetry. It is also worthwhile mentioning a different approach, used 
in the analysis of convolutional codes IbSf , of employing the transfer-matrix formalism and 
power series expansions. However, to date, the only model that can be analysed exactly is the 
REM that corresponds to an impractical coding scheme of a vanishing code rate. 

In this letter we present an exact analysis to the performance of Gallager error-correcting 
codes on a generalisation of Bethe lattices known as Husimi cactus . We solve the model 
recovering results obtained by the replica symmetric theory and finding the noise level that 
corresponds to the phase transition between perfect decoding and a decoding failure phase, this 
appears to coincide with existing information-theoretic upper bounds. We experimentally show 
that the solution accurately approximates Gallager codes of moderate size. We also show that 
the probability propagation (PP) decoding algorithm emerges naturally from this framework 
allowing for the analysis of the practical decoding performance. Finally, we summarise the 
differences between Gallager and MN codes, which are somewhat obscure in the information 
theory literature but become explicit in this framework. 

We will concentrate here on a simple communication model whereby messages are repre- 
sented by binary vectors and are communicated through a Binary Symmetric Channel (BSC) 
where uncorrelated bit flips appear with probability /. A Gallager code is defined by a binary 
matrix A = [Ci | C2], concatenating two very sparse matrices known to both sender and 
receiver, with C2 (of dimensionality (M — N) x (M — N)) being invertible - the matrix Ci is 
of dimensionality (M — N) x N. 

Encoding refers to the production of a M dimensional binary code word t G {0, 1} M 
(M > N) from the original message £ G {0, 1}^ by t = G T £ (mod 2), where all operations 
are performed in the field {0, 1} and are indicated by (mod 2). The generator matrix is G = 
[I I C^Ci] (mod 2), where J is the N x N identity matrix, implying that AG T (mod 2) = 
and that the first N bits of t are set to the message In regular Gallager codes the number 
of non-zero elements in each row of A is chosen to be exactly K. The number of elements 
per column is then C = (1 — R)K, where the code rate is R = N/M (for unbiased messages). 
The encoded vector t is then corrupted by noise represented by the vector £ G {0, 1} with 
components independently drawn from P(() = (1 — f)S(C) + fS(( — 1). The received vector 
takes the form r = G T £ + C (mod 2). 

Decoding is carried out by multiplying the received message by the matrix A to produce 
the syndrome vector z — Ar = A£ (mod 2) from which an estimate t for the noise vector 
can be produced. An estimate for the original message is then obtained as the first iV bits of 
r + f (mod 2). The Bayes optimal estimator (also known as marginal posterior maximiser, 
MPM) for the noise is defined as fj — argmax^P^ | z). The performance of this estimator 

can be measured by the probability of bit error pi, = 1 — 1/M 'Y^f—x Cj]i where <$[;] is 
Kronecker's delta. Knowing the matrices C2 and C±, the syndrome vector z and the noise 
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Fig. 1. - First step in the construction of Husimi cactus with K — 3 and connectivity C = 4. 



level / it is possible to apply Bayes' theorem and compute the posterior probability 

P(t \ z ) = i x [ z = Ar(mod 2)] P(r), (1) 

where x[X] is an indicator function providing 1 if X is true and otherwise. To compute 
the MPM one has to compute the marginal posterior P(rj \ z) = P{t \ z)> which in 

general requires 0(2 M ) operations, thus becoming impractical for long messages. To solve this 
problem one can use the sparseness of A to design algorithms that require 0(M) operations to 
perform the same task. One of these methods is the probability propagation algorithm (PP), 
also known as belief propagation, sum-product algorithm (see [po[ ) or generalised distributive 
law 

The connection to statistical physics becomes clear when the field {0, 1} is replaced by Ising 
spins {±1} and mod 2 sums by products The syndrome vector acquires the form of a 
multi-spin coupling = Ylje£(u) Cj where j = 1, • • • , M and fi = 1, • • • , (M — N). The K 
indices of nonzero elements in the row \x of A are given by = {ji, • ■ ■ , jx}, and in a 

column I are given by — {fjti, • ■ • , He}- 

The posterior (|]) can be written as the Gibbs distribution : 

P(t\J) = i lim exp[-pHf,(T;J)] (2) 

Zi p—*oo 
M-N 

H p (t;J) = - 

The external field corresponds to the prior probability over the noise and has the form 
F = atanh(l — 2f). Note that the Hamiltonian itself depends on the inverse temperature 
f3. The disorder is trivial and can be gauged as i— > 1 by using Tj >— * TjQ. The resulting 
Hamiltonian is a multi-spin ferromagnet with finite connectivity in a random field hj = 
fi^FQ. The decoding process corresponds to finding zero temperature local magnetisations 
rrij — lim^^ 00 (Tj) / 3 and calculating estimates as Tj — sgn(mj). 

In the {±1} representation the probability of bit error, acquires the form 

1 1 M 

P b =2~ wE^J' s S n ( m j), (3) 
i=i 

connecting the code performance with the computation of local magnetisations. 

A Husimi cactus with connectivity C is generated starting with a polygon of K vertices with 
one Ising spin in each vertex (generation 0). All spins in a polygon interact through a single 
coupling and one of them is called the base spin. In figure ^ we show the first step in the 
construction of a Husimi cactus, in a generic step the base spins of the n— 1 generation polygons, 
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numbering (C — 1)(K — 1), are attached to K — 1 vertices of a generation n polygon. This 
process is iterated until a maximum generation nmax is reached, the graph is then completed 
by attaching C uncorrelated branches of n max generations at their base spins. In that way 
each spin inside the graph is connected to exactly C polygons. The local magnetisation at the 
centre rrij can be obtained by fixing boundary (initial) conditions in the 0-th generation and 
iterating recursion equations until generation n m ax is reached. Carrying out the calculation 
in the thermodynamic limit corresponds to having n ma x ~ hxM generations and M — > oo. 

The Hamiltonian of the model has the form (0) where C(jjl) denotes the polygon /j, of the 
lattice. Due to the tree-like structure, local quantities far from the boundary can be calculated 
recursively by specifying boundary conditions. The typical decoding performance can therefore 
be computed exactly without resorting to replica calculations pl| . 

We adopt the approach presented in JlSf ] where recursion relations for the probability 
distribution P^s^Tfe) for the base spin of the polygon \i is connected to (C — 1)(K — 1) 
distributions Pvj(Tj), with v € M{j) \ M (all polygons linked to j but /x) of polygons in 
the previous generation: 



1 



P^k{Tk) = jj Tr {Tj} exp 



p ( ^ n t j ■ - 1 ) + Frk 

je£(/*)\fc 



n n p »i<n)> w 

ueM(j)\nje£.(iJ,)\k 



where the trace is over the spins Tj such that j 6 C(fi) \ k. 

The effective field x V j on a base spin j due to neighbours in polygon v can be written as : 



exp (— 2x 



J2F 1 vj 



i'3 I 



Combining (^) and (^J) one finds the recursion relation: 



exp(-2x li k) = 



By computing the traces and taking f3 — > oo one obtains: 



Tr {rj} exp 




^E J e£( Al )\fe(- F - 


1" Yl v eM{j)\iJ. Xv i) T o 


Tr {Tj} exp 


+0Jn rije£( M )\fe T 3 ' 


^Y,3ec(p)\ki F ' 





(5) 



(6) 



X^k 



atanh 



J„ Y[ tanh(i^+ 



(J) 



The effective local magnetisation due to interactions with the nearest neighbours in one branch 
is given by m w = tanh(x w -). The effective local field on a base spin j of a polygon \x due to C— 1 
branches in the previous generation and due to the external field is x^j — F + J2^£M(j)\^ ®vi'i 
the effective local magnetisation is, therefore, m^j — tanh(a; At j). Equation (|7]) can then be 
rewritten in terms of rh^j and m MJ - and the PP equations fl [L2L EG] can be recovered: 



m, lk = tanh F 



E 



atanh {rh v k) 



T'fik 



J M I| m A 

jec(n)\k 



(8) 



Once the magnetisations on the boundary (0-th generation) are assigned, the local mag- 
netisation rrij in the central site is determined by iterating (^) and computing : 



tanh I F + atanh (f 

veM{j) 



(9) 
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Fig. 2. - (a) Mean normalised overlap between the actual noise vector £ and decoded noise r for K = 4 
and C = 3 (therefore R — 1/4). Theoretical values (□), experimental averages over 20 runs for code 
word lengths M = 5000 (•) and M = 100 (full line), (b) Transitions for K = 6. Shannon's bound 
(dashed line), information theory upper bound (full line) and thermodynamic transition obtained 
numerically (o). Theoretical (O) and experimental (+, M = 5000 averaged over 20 runs) PP decoding 
transitions are also shown. In both figures, symbols are chosen larger than the error bars. 



The free energy can be obtained by integration as (||) represents extrema of a free energy 

By applying the gauge transformation i— » 1 and Tj i— > TjCj; assigning the probability 
distributions Pq (x) to boundary fields and averaging over random local fields FC one obtains 
from (Q) the recursion relation in the space of probability distributions P(x) f23|j : 



. C-i 



Pn(x) 
Pn-l(x) 



J J[ dxi P„_i(xi) ^ 
. K-l 

j n p ™- i ( x j) s 



C-l 

i=i 

(K-X 
atanh I ] J tanh(a 



(10) 



where P n (%) is the distribution of effective fields at the n-th generation due to the previous 
generations and external fields, in the thermodynamic limit the distribution far from the 
boundary will be Poc(x) (generation n — > oo). The local field distribution at the central site is 
computed by replacing C — 1 by C in ( |Io| ) , taking into account C polygons in the generation 
just before the central site, and inserting the distribution Pqq (x) . Equations ( |i"(i| ) are identical 
to those obtained by the replica symmetric theory as in [ful fL5l n6| . 



By setting initial (boundary) conditions Pq(x) and numerically iterating (10), for C > 3 one 
can find, up to some noise level f s , a single stable fixed point at infinite fields, corresponding 
to a totally aligned state (successful decoding) . At f s a bifurcation occurs and two other fixed 
points appear, stable and unstable, the former corresponding to a misaligned state (decoding 
failure). This situation is identical to that one observed in JIJ, |l5|, In terms of the 

local fields distribution P n {x), the aligned state corresponds to a runaway wave travelling to 
x(n) — > co with n being the time variable. The misaligned state corresponds to a stable wave 
located at x(n) ~ 0(1). Representing the distributions ( |Io|) by the first cummulants only, one 
can obtain a rough approximation in terms of one dimensional maps showing a bifurcation at 
some noise level f s , this approach will be further exploited elsewhere. 

The practical PP decoding is performed by setting initial conditions as m w - = 1 — 2/ to 
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Table I. - Gallager versus MN codes 





Gallager 


MN 


dynamical variables 


M 


N+M 


constraints 


M-N 


M 


unbiased messages coding 


for all K 


K =1,2 


Shannon's bound 


K— > oo 


K> 3 and unbiased messages 



correspond to the prior probabilities and iterating (g) until stationarity or a maximum number 
of iterations is attained || . The estimate for the noise vector is then produced by computing 
Tj = signfmj). At each decoding step the system can be described by histograms of the 
variables this is equivalent to iterating ([n]) (a similar idea was presented in [|[ |(|). Below 
f s the process always converges to the successful decoding state, above f s it converges to the 
successful decoding only if the initial conditions are fine tuned, in general the process converges 
to the failure state. In Fig.||a we show the theoretical mean overlap between actual noise £ and 
the estimate r as a function of the noise level / as well as results obtained with PP decoding. 

Information theory provides an upper bound for the maximum attainable code rate by 
equalising the maximal information contents of the syndrome vector z and of the noise estimate 
t p| p6| . The thermodynamic phase transition obtained by finding the stable fixed points of 
(|L0[) and their free energies interestingly coincides with this upper bound within the precision 
of the numerical calculation. Note that the performance predicted by thermodynamics is not 
practical as it requires 0(2 M ) operations for an exhaustive search for the global minimum of 
the free energy. In Fig.||b we show the thermodynamic transition for K = 6 and compare with 
the upper bound, Shannon's bound and f s values. 

The geometrical structure of a Gallager code defined by the matrix A can be represented by 
a bipartite graph {Tanner graph) p(| with bit and check nodes. Each column j of A represents 
a bit node and each row [i represents a check node, A^j = 1 means that there is an edge linking 
bit j to check /x. It is possible to show that for a random ensemble of regular codes, the 
probability of completing a cycle after walking I edges starting from an arbitrary node is upper 
bounded by V[l;K, C, M\ < l 2 K l /M. It implies that for very large M only cycles of at least 
order InM survive. In the thermodynamic limit M — > oo the probability V[l;K, C, M] — > for 
any finite I and the bulk of the system is effectively tree-like. By mapping each check node to 
a polygon with K bit nodes as vertices, one can map a Tanner graph into a Husimi lattice that 
is effectively a tree for any number of generations of order less than InM. It is experimentally 
observed that the number of iterations of (|J) required for convergence does not scale with 
the system size, therefore, it is expected that the interior of a tree-like lattice approximates 
a Gallager code with increasing accuracy as the system size increases. FigJ^a shows that the 
approximation is fairly good even for sizes as small as M = 100. Note that although the 
local magnetisations nij for a loopy graph are not generally expected to converge to the values 
computed in a tree, sgn(mj) seems to do so. A thorough discussion on this respect for some 
specific graphical models can be found in [25j| . 

In H MacKay and Neal introduced a variation on Gallager codes termed MN codes. The 
main difference between these codes is that for MN codes the syndrome vector contains also 
information on the original message in the form z = C s £ + C n C- The message itself is directly 
estimated and there is no need for recovering the noise vector. MacKay has formulated and 
proved a number of theorems simultaneously for both codes using the fact that if both message 
and noise are sampled from the same distribution, these codes can be formulated as the same 
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estimation problem, to say, finding the most probable vector x that satisfies z = Ax, given 
the matrix A and a prior distribution P(x). Using statistical physics, we previously analysed 
MN codes [II], . It is interesting to note that in spite of the similarity between the two 
codes, there are some important differences in their dependence on the parameters K and C. 
In particular, Shannon's bound is only attainable by Gallager codes if K — > oo, in contrast 
to results obtained for MN codes. Decoding of unbiased messages is generally possible with 
Gallager codes, but successful convergence is only guaranteed (in the thermodynamic limit) 
for K = 1 , 2 in the MN codes. We outlined those differences in table Q. 

To summarise, we solved exactly, without resorting to the replica method, a system rep- 
resenting a Gallager code on a Husimi cactus. The results obtained are in agreement with 
the replica symmetric calculation and with numerical experiments carried out in systems 
of moderate size. The framework can be easily extended to MN and similar codes. We 
believe that methods of statistical physics are complimentary to those used in the statistical 
inference community and can enhance our understanding of general graphical models beyond 
error-correcting codes. 
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